25 research outputs found
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are casually used for
various purposes, are often computed on-the-fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand
DOEF: A dynamic object evaluation framework
Abstract. In object-oriented or object-relational databases such as multimedia databases or most XML databases, access patterns are not static, i.e., applications do not always access the same objects in the same order repeatedly. However, this has been the way these databases and associated optimisation techniques like clustering have been evaluated up to now. This paper opens up research regarding this issue by proposing a dynamic object evaluation framework (DOEF) that accomplishes access pattern change by defining configurable styles of change. This preliminary prototype has been designed to be open and fully extensible. To illustrate the capabilities of DOEF, we used it to compare the performances of four state of the art dynamic clustering algorithms. The results show that DOEF is indeed effective at determining the adaptability of each dynamic clustering algorithm to changes in access pattern
Fragmenting Very Large XML Data Warehouses via K-means Clustering Algorithm
XML data sources are gaining popularity in the context of Business Intelligence and On-Line Analytical Processing (OLAP) applications, due to the amenities of XML in representing and managing complex and heterogeneous data. However, XML-native database systems currently suffer from limited performance, both in terms of volumes of manageable data and query response time. Therefore, recent research efforts are focusing on horizontal fragmentation techniques, which are able to overcome the above limitations. However, classical fragmentation algorithms are not suitable to control the number of originated fragments, which instead plays a critical role in data warehouses. In this paper, we propose the use of the K-means clustering algorithm for effectively and efficiently supporting the fragmentation of very large XML data warehouses. We complement our analytical contribution with a comprehensive experimental assessment where we compare the efficiency of our proposal against existing fragmentation algorithms